Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update test read write on disk #515

Merged

Conversation

ArneDefauw
Copy link
Contributor

update unit test incremental io on disk, see #501 (comment)

@ArneDefauw ArneDefauw mentioned this pull request Mar 26, 2024
15 tasks
@LucaMarconato
Copy link
Member

Hi @ArneDefauw, thanks for the contribution. Is this still in draft mode? If so, what's missing? Thanks!

@ArneDefauw ArneDefauw marked this pull request as ready for review March 27, 2024 16:41
@ArneDefauw
Copy link
Contributor Author

I changed status of PR, there is not really something missing, but for points, there still seem to be issues with workaround 2 ,as suggested, I am not sure if this is expected or not...


with pytest.raises(
ValueError, match="The Zarr store already exists. Use `overwrite=True` to try overwriting the store."
):
sdata.write_element(name)

with pytest.raises(ValueError, match="Currently, overwriting existing elements is not supported."):
with pytest.raises(ValueError): # TODO this gives different messages for different element types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusion, I will adjust it, but correct. Points, images and labels may be backed from disk. In that case the problem is that even with overwrite=True, we can't overwrite those elements, because overwriting them would mean to actually delete the source of the backed data.

The second error message occurs with non-backed images, labels, points or with shapes and tables (that currently are always non-backed). In that case we could overwrite the data but we don't do it because this could lead to data loss anyway (the case above is instead a "systematic data loss").

The only overwrite case that is supported is overwriting into a different Zarr store, for instance an old, non-loaded, spatialdata object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workarounds I show in the tests work for both cases; I will document them in an example notebook and eventually we could automate their execution (after warning the user that this works only under particular assumptions).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will now make the error messages less cryptic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@LucaMarconato
Copy link
Member

Thanks for reporting and for the code. The use case of wanting to overwrite lazy data is an important one and now this is considered in tests. For clarify, I have split the test into 4 subcases: workaround 1 and 2, and lazy vs non-lazy data. In this way the user can see which case to use and have some code to start with.

A few more comments:

  • the error messages, now made clearer, point to a discussion (Workaround for overwriting existing elements #520), which currently points to the test considered in this PR, to show some code to start with.
  • currently we don't allow to reload single elements, or to write the sdata/element and immediately reload it; I plan to add this in the future. Tracking it here: Allow reloading of particular elements, and add reload: bool parameter to write functions #521.
  • Finally, as now it is also being clarified in the test, the workaround 2 should not be used for lazy data. It is expected for it to fail for points, images and labels. Unfortunately the approach that you suggested for workaround 2 leads to data loss. The error given for points is expected and I was surprised it was working for images and labels. I looked into it better: somehow the xarray.DataArray doesn't throw an error when it tries to load from the Zarr store that had been deleted, but if you compute the data you will see that the tensor is now a tensor of zeros.

@LucaMarconato LucaMarconato merged commit 57dd22e into scverse:feature/incremental_io Mar 27, 2024
2 checks passed
LucaMarconato added a commit that referenced this pull request Jun 10, 2024
* implemented incremental io; tests missing

* added draft for write_metadata(); need to write new tests

* wip better explanation

* fixed bug wrong order of points columns after spatial query

* testing the copying of metadata and their inclusion in assert_elements_are_identical

* improved readwrite tests

* added tests for incremental io

* implemented write_metadata for transformations

* tests for incremental io of transformation, with separate validation for
writing metadata incrementally

* tests for IO and incremental IO of consolidated metadata

* improved control over elements only on-disk/in-memory

* added tests for delete_element_from_disk

* fix

* added _check_element_not_on_disk_with_different_type()

* updated changelog

* fixed changelog

* attempt fix docs

* Update src/spatialdata/_io/_utils.py

Co-authored-by: Kevin Yamauchi <[email protected]>

* fixes from review

* update test read write on disk (#515)

* test read write on disk

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improved tests for workarounds for incremental io

* fixed tests

* improved comment

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Marconato <[email protected]>

* Update src/spatialdata/_core/spatialdata.py

Co-authored-by: Giovanni Palla <[email protected]>

* list of names for write_element() and delete_element_from_disk()

* improved docs

* code review from Giovanni

---------

Co-authored-by: Kevin Yamauchi <[email protected]>
Co-authored-by: ArneD <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Giovanni Palla <[email protected]>
LucaMarconato added a commit that referenced this pull request Jul 24, 2024
* implemented incremental io; tests missing

* added draft for write_metadata(); need to write new tests

* wip better explanation

* fixed bug wrong order of points columns after spatial query

* testing the copying of metadata and their inclusion in assert_elements_are_identical

* improved readwrite tests

* added tests for incremental io

* implemented write_metadata for transformations

* tests for incremental io of transformation, with separate validation for
writing metadata incrementally

* tests for IO and incremental IO of consolidated metadata

* improved control over elements only on-disk/in-memory

* added tests for delete_element_from_disk

* fix

* added _check_element_not_on_disk_with_different_type()

* updated changelog

* fixed changelog

* attempt fix docs

* Update src/spatialdata/_io/_utils.py

Co-authored-by: Kevin Yamauchi <[email protected]>

* fixes from review

* update test read write on disk (#515)

* test read write on disk

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improved tests for workarounds for incremental io

* fixed tests

* improved comment

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Marconato <[email protected]>

* Update src/spatialdata/_core/spatialdata.py

Co-authored-by: Giovanni Palla <[email protected]>

* wip

* wip

* wip support for legacy spatialdata format

* before implementing the format logic in read

* all tests passing; cleanup

* cleanup format

* removed spatialdata_format_version metadata field from raster

* Update src/spatialdata/_core/spatialdata.py

Co-authored-by: Giovanni Palla <[email protected]>

* added formats to docs

* added sphinx interlink for format

---------

Co-authored-by: Kevin Yamauchi <[email protected]>
Co-authored-by: ArneD <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Giovanni Palla <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants